IEEE Journal of Biomedical and Health Informatics
● Institute of Electrical and Electronics Engineers (IEEE)
Preprints posted in the last 30 days, ranked by how well they match the content profile of IEEE Journal of Biomedical and Health Informatics, based on 34 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is an above-average fit.
Sakurai, R.; Kojima, S.; Otake-Matsuura, M.; Kanoh, S.; Rutkowski, T. M.
Traditional psychiatric assessments for depression are often hindered by subjective bias and patient recall inaccuracy. This paper presents a multimodal passive Brain-Computer Interface (pBCI) designed for the objective screening of depressive traits through the end-to-end decoding of neural dynamics. We implemented a hybrid EEG-fNIRS framework to capture synchronized electro-hemodynamic responses during an emotional working memory (EWM) task. To classify sub-clinical depressive tendencies based on BDI-II scores, we utilized SincShallowNet, a deep learning architecture optimized for raw signal processing via learnable Sinc-filters. Our results demonstrate that the pBCI achieves peak performance in the auditory modality, with the integration of EEG and low-pass filtered fNIRS (0.15 Hz) yielding a balanced accuracy of 90.9% and an F1-score of 0.867. By isolating purely endogenous neural markers during the EWM maintenance phase, the system provides a robust "silent observer" for mental state monitoring. These findings validate the potential of multimodal pBCIs as high-precision, data-driven tools for early-stage depression screening, offering a scalable alternative to traditional clinical interviews and a foundation for longitudinal mental health monitoring.
Huang, X.; Hsieh, C.; Nguyen, Q.; Renteria, M. E.; Gharahkhani, P.
Wearable-derived physiological features have been associated with disease risk, but most current studies focus on single conditions, limiting understanding of cross-disease patterns. This study adopts a trans-diagnostic approach to examine whether wearable data capture shared and condition-specific physiological signatures across multiple chronic conditions spanning physical and mental health, and then evaluates the utility of these features for disease classification. A total of 9,301 patients with at least 21 days of consecutive FitBit data from the All of Us Controlled Tier Dataset version 8 were analyzed. Disease subcohorts included cardiovascular disease (CVD), diabetes, obstructive sleep apnea (OSA), major depressive disorder (MDD), anxiety, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD), chosen based on prevalence and relevance. Logistic regression and XGBoost models were fitted for each disease subcohort versus the control cohort. We found that compared to using just baseline demographic and lifestyle features, incorporating wearable-derived features enabled improved classification performance in all subcohorts for both models, except for ADHD, where improvement was mainly observed for ROC-AUC in the logistic regression model, likely due to the smaller sample size in the ADHD subcohort. The largest performance gains were observed in MDD (increase in ROC-AUC of 0.077 for logistic regression, 0.071 for XGBoost; p < 0.001) and anxiety (increase in ROC-AUC of 0.077 for logistic regression, 0.108 for XGBoost; p < 0.001). This study provides one of the first comprehensive transdiagnostic evaluations of wearable-derived features for disease classification, highlighting their potential to enhance risk stratification in the real-world setting as a practical complement to clinical assessments and providing a foundation to explore more fine-grained wearable data.
Author summary: Wearable devices such as fitness trackers and smartwatches are becoming increasingly popular and affordable, providing continuous measurements of heart rate, physical activity, and sleep. Alongside the growing digitization of health records, this creates new opportunities for large-scale, real-world health studies. In this study, we analyzed wearable-derived physiological patterns across a range of chronic conditions spanning both physical and mental health to better understand how these signals relate to disease risk. We found that incorporating wearable-derived heart rate, activity and sleep features improved disease risk classification across several conditions, with particularly strong gains for major depressive disorder and anxiety. By examining how individual features contributed to model predictions, we also identified meaningful associations between physiological signals and disease risk. For example, both duration and day-to-day variation of deep and rapid eye movement (REM) sleep were associated with increased risk in certain conditions. Our study supports the development of real-time, automated tools to assess disease risk alongside clinical care.
Chen, Z.; Wu, R.; Liu, Y.; Li, R.; Duprey, A.
The integration of Large Language Models into high-stakes clinical workflows is critically hampered by their lack of verifiable reliability and tendency to generate hallucinations. This paper introduces Med-ICE, an autonomous framework designed to enhance the reliability of LLMs for medical applications. Med-ICE adapts the Iterative Consensus Ensemble paradigm, enabling a group of peer LLM agents to collaboratively converge on a final answer through iterative rounds of generation and peer review, thereby eliminating the need for an external arbiter and its associated scalability bottleneck. Our work makes three key contributions: (1) a novel semantic consensus mechanism that determines agreement based on semantic similarity, crucial for nuanced clinical language; (2) demonstration of state-of-the-art performance, where Med-ICE significantly outperforms both direct single-LLM generation and the Self-Refinement technique on challenging medical benchmarks; and (3) a highly efficient and scalable architecture, as our Semantic Consensus Monitor is computationally lightweight. This research establishes a new standard for developing safer, more trustworthy LLM systems, paving the way for their responsible integration into medicine.
Ogretir, M.; Kaipainen, V.; Leskinen, M.; Lahdesmaki, H.; Koskinen, M.
Neonates requiring intensive care are at increased risk for long-term neuropsychiatric disorders. However, clinical adoption of risk prediction models remains limited when their performance lacks adequate interpretability for informed clinical decision-making. Here, we investigated whether longitudinal neonatal electronic health record (EHR) data from the first 90 days of life can support clinically meaningful interpretation of long-term risk signals for major neuropsychiatric diagnoses by age seven. In a retrospective register-based cohort of 17,655 at-risk children from an academic medical center, of whom 8.0% (1,420) received a major neuropsychiatric diagnosis during follow-up, we applied a time-aware transformer model (Self-supervised Transformer for Time-Series; STraTS) and thoroughly evaluated its predictions using three complementary interpretability approaches: perturbation-based variable importance, value-dependent effect analysis, and leave-one-out (LOO) feature attribution. STraTS achieved the highest area under the precision-recall curve (AUPRC 0.171 ± 0.022), compared with Random Forest (0.166 ± 0.008), logistic regression (0.151 ± 0.007), and XGBoost (0.128 ± 0.010). Across interpretability methods, five predictors were consistently identified: birth weight, gender, Apgar score at 1 minute, umbilical serum thyroid stimulating hormone (uS-TSH), and treatment time in hospital. Indicators of early clinical severity, including chromosomal abnormalities and neonatal cerebral-status disturbances, showed the largest risk-increasing effects. Furthermore, the model's learned vector representations of subject-specific EHR sequences formed clinically coherent latent embeddings that reflect population heterogeneity along established perinatal risk dimensions.
These findings demonstrate that combining multiple complementary interpretability methods yields stable, clinically plausible risk signals while revealing limitations that would remain undetected by any single approach, highlighting the importance of careful interpretability analysis of deep learning-based risk predictions.
Specht, B.; Tayeb, Z. Z.; Garbaya, S.; Khadraoui, D.; EL-Khozondar, M.; Schneider, R.
Accurate inference of physiological state across the menstrual cycle has important applications in reproductive health and in understanding symptom dynamics, yet most non-hormonal approaches rely on wearable sensors or calendar-based tracking. Whether self-reported symptoms alone can support prospective, cross-subject phase classification remains unresolved. Here, we introduce a hybrid modelling framework that combines a gradient-boosted classifier with a Hidden Semi-Markov Model to infer four menstrual cycle phases (menstrual, follicular, fertile, and luteal) from self-reported data. The classifier captures non-linear symptom patterns, while the temporal model imposes biologically grounded constraints, including cyclic ordering and realistic phase durations. In a leave-one-subject-out evaluation using hormonally annotated data from 41 participants, the model achieved 67.6% accuracy and a macro F1 score of 0.662. Features reflecting short-term symptom variability were more informative than absolute symptom levels, indicating that within-person fluctuation provides a more generalisable signal of cycle phase than symptom intensity alone. These findings demonstrate the feasibility of low-burden, device-free menstrual health monitoring, establish symptom dynamics as a basis for scalable digital biomarkers, and expand access to tracking in resource-constrained settings and populations underserved by wearable-based approaches.
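The leave-one-subject-out (LOSO) protocol used in the evaluation above can be sketched in a few lines. This is a generic illustration with toy data and a placeholder majority-class "model", not the authors' classifier:

```python
from collections import Counter

def loso_accuracy(data_by_subject):
    """Leave-one-subject-out evaluation: hold out each subject in turn,
    fit on the rest, and score predictions on the held-out subject.
    data_by_subject: {subject_id: [(features, label), ...]}.
    The 'model' here is just the training-set majority class."""
    correct = total = 0
    for held_out in data_by_subject:
        # Pool labels from every other subject (the training fold)
        train_labels = [lab for sid, rows in data_by_subject.items()
                        if sid != held_out for _, lab in rows]
        majority = Counter(train_labels).most_common(1)[0][0]
        # Score the held-out subject's samples
        for _, lab in data_by_subject[held_out]:
            correct += (majority == lab)
            total += 1
    return correct / total

# Hypothetical symptom data: features are irrelevant to the toy model
data = {
    "s1": [((0.2,), "luteal"), ((0.4,), "luteal")],
    "s2": [((0.1,), "luteal"), ((0.9,), "menstrual")],
    "s3": [((0.8,), "menstrual")],
}
acc = loso_accuracy(data)
```

The key property LOSO tests is cross-subject generalisation: no sample from the held-out participant ever influences the model that scores them.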
Rahjouei, A.
Actigraphy is widely used for long-term sleep monitoring, but established sleep-wake scoring algorithms often require parameter tuning, which is commonly performed manually and can reduce reproducibility. In this study, we present a grid-search-based calibration framework for established actigraphy algorithms and evaluate whether it can serve as a practical alternative to manual tuning. The method was evaluated using two datasets: a multi-subject polysomnography-validated actigraphy dataset and a self-collected dual-device dataset. In the polysomnography-validated dataset, grid-search optimization produced performance patterns similar to manual parameter selection, while slightly improving detection of sleep onset and sleep offset and yielding modest gains in wake-sensitive metrics. In the dual-device dataset, consensus and majority voting were useful for reducing the influence of brief wake episodes occurring within the main sleep period, including micro-awakenings that can fragment sleep predictions across individual algorithms. Overall, these findings show that grid-search can replace manual parameter tuning with a more explicit and reproducible procedure while providing small improvements in sleep timing estimation and benefiting ensemble-based handling of within-sleep wakefulness.
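Grid-search calibration of a scoring algorithm against reference labels can be illustrated with a minimal sketch. The toy threshold scorer, parameter grid, and data below are hypothetical stand-ins, not the study's algorithms or datasets:

```python
from itertools import product

def score_epochs(activity, threshold, window):
    """Toy actigraphy scorer: an epoch is 'sleep' (1) when the moving-average
    activity count over a centered window stays below a threshold."""
    scored = []
    half = window // 2
    for i in range(len(activity)):
        lo, hi = max(0, i - half), min(len(activity), i + half + 1)
        avg = sum(activity[lo:hi]) / (hi - lo)
        scored.append(1 if avg < threshold else 0)
    return scored

def accuracy(pred, ref):
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)

def grid_search(activity, ref, thresholds, windows):
    """Exhaustively evaluate every (threshold, window) pair against the
    reference labels and return the best (accuracy, threshold, window)."""
    best = None
    for t, w in product(thresholds, windows):
        acc = accuracy(score_epochs(activity, t, w), ref)
        if best is None or acc > best[0]:
            best = (acc, t, w)
    return best

# Toy activity counts and reference (e.g. PSG-derived) sleep labels
activity = [0, 0, 0, 0, 10, 12, 11, 9, 0, 0, 0, 0]
psg = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
best = grid_search(activity, psg, thresholds=[5, 20], windows=[1, 3])
```

The reproducibility benefit is that the search space and selection criterion are both explicit in code, rather than living in an analyst's head.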
Georgiou, G. P.; Paphiti, M.
Autism spectrum disorder (ASD) is a neurodevelopmental condition for which timely and accurate detection remains a major clinical priority. Early and reliable identification is important because it can facilitate access to assessment, diagnosis, and appropriate support; however, current diagnostic pathways still rely largely on behavioural evaluation and clinical judgement. In this context, machine-learning (ML) approaches have attracted growing interest because they can identify subtle and complex patterns in speech data that may not be easily captured through conventional methods. The current study capitalizes on this potential by developing and evaluating ML models for distinguishing autistic individuals from neurotypical individuals based on speech features. More specifically, acoustic features of vowels, including fundamental frequency (F0), first three formants (F1, F2, F3), duration, jitter, shimmer, harmonics-to-noise ratio (HNR), and intensity, were elicited from 18 autistic adults and 18 neurotypical adults through a controlled production task. Then, four supervised ML models were trained and evaluated on these features: LightGBM, Random Forest, Support Vector Machine, and XGBoost. All models demonstrated good classification performance, with the best-performing model achieving a strong discriminability of 89%. The explainability analysis identified F0 as the most influential predictor by a substantial margin, followed by intensity, F3, and F1, while duration, shimmer, HNR, jitter, and F2 contributed more modestly. These findings demonstrate that vowel acoustics contain clinically relevant information for distinguishing autistic from neurotypical adult speech and highlight the potential of interpretable, speech-based ML as a transparent and scalable aid for ASD screening and assessment.
Daya, N. R.; Wang, D.; Zhang, S.; Fang, M.; Wallace, A.; Zeger, S.; Selvin, E.
In this article, we present the cgmstats package for the analysis of continuous glucose monitoring (CGM) data. The use of wearable CGMs is growing rapidly. The latest generation of CGM systems do not require fingerstick calibration, are minimally invasive, and are frequently used in research studies. CGM sensors are typically worn for up to 2 weeks and record interstitial glucose measurements every minute to every 15 minutes, depending on the sensor used. CGM systems generate hundreds of measurements per day and thousands of measurements in one person over a single wear. There is a need for tools that allow researchers to efficiently organize and summarize the wealth of data on glucose patterns produced by CGM systems. The cgmstats package generates CGM summary measures for data from a variety of CGM systems and allows the user to flexibly define ranges and generate data visualizations. In this article, we provide an overview of the cgmstats package and examples of its use. The cgmstats package supports rigorous and reproducible analyses of CGM data.
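One of the standard CGM summary measures such packages compute is percent time-in-range with user-defined thresholds. The sketch below is a generic illustration of that measure with hypothetical readings, not the cgmstats API:

```python
def time_in_range(glucose_mgdl, low=70, high=180):
    """Fraction of CGM readings within [low, high] mg/dL.
    Defaults follow the common 70-180 mg/dL target range; both bounds
    can be redefined by the caller, mirroring the flexible range
    definitions described above."""
    in_range = sum(low <= g <= high for g in glucose_mgdl)
    return in_range / len(glucose_mgdl)

# Hypothetical interstitial glucose samples (mg/dL), e.g. 15-minute spacing
readings = [95, 110, 185, 160, 65, 140, 120, 175]
tir = time_in_range(readings)             # default 70-180 mg/dL range
tight = time_in_range(readings, 70, 140)  # user-redefined tighter range
```

In practice a real CGM analysis also has to handle sensor gaps and per-day aggregation, which is exactly the bookkeeping a dedicated package removes.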
Uus, A.; Fukami-Gartner, A.; Kyriakopoulou, V.; Cromb, D.; Morgan, T.; Arulkumaran, S.; Egloff Collado, A.; Luis, A.; Bos, R.; Makropoulos, A.; Schuh, A.; Robinson, E.; Sousa, H.; Deprez, M.; Cordero-Grande, L.; Bradshaw, C.; Colford, K.; Hutter, J.; Price, A.; O'Muircheartaigh, J.; Hammers, A.; Rueckert, D.; Counsell, S.; McAlonan, G.; Arichi, T.; Edwards, A. D.; Hajnal, J. V.; Rutherford, M. A.; Story, L.
Regional volumetric assessment of perinatal brain development is currently limited by the lack of consistent high quality multi-regional segmentation methods applicable to both fetal and neonatal MRI. We present Multi-BOUNTI, a deep learning pipeline for automated multi-lobe segmentation of fetal and neonatal T2w brain MRI. The method is based on a dedicated 43-label parcellation protocol and a 3D Attention U-Net trained on brain MRI datasets of subjects spanning 21-44 weeks gestational/postmenstrual age. The pipeline integrates preprocessing, segmentation and volumetric analysis, and was evaluated on independent datasets, demonstrating fast (< 10 min/case) and accurate performance with high agreement to manually refined labels. We demonstrate the application of the framework with 267 fetal and 593 neonatal MRI datasets from the developing Human Connectome Project without reported clinically significant brain anomalies to derive normative volumetric growth models across 21-44 weeks GA/PMA. These models were used to characterise developmental trajectories, assess differences between fetal and preterm neonatal cohorts, and analyse longitudinal changes. The resulting normative models were integrated into an automated reporting framework enabling subject-specific volumetric assessment via centiles and z-scores. Multi-BOUNTI provides a unified and scalable approach for perinatal brain segmentation and volumetry, supporting large-scale studies and facilitating future clinical translation. The full pipeline is publicly available at https://github.com/SVRTK/perinatal-brain-mri-analysis.
Bolpagni, M.; Pozza, M.; Gabrielli, S.
Chronic psychological stress contributes to allostatic load and is associated with cardiovascular, metabolic, and mental health disorders. Wearable devices enable continuous, noninvasive monitoring of autonomic signals such as heart rate variability (HRV), creating new opportunities for real-time stress assessment. Large language models (LLMs) are increasingly explored as interfaces for interpreting such data, but it remains unclear whether their predictions reflect physiologically meaningful patterns or rely on superficial heuristics. In this study, we assess whether LLM-derived stress predictions are physiologically coherent and how this varies with model scale. Using a longitudinal wearable dataset collected in naturalistic conditions (35 participants; 5,100 five-minute windows with HRV and contextual features), we obtained stress pseudoprobabilities from three models in the Mistral 3 family (675B, 14B, 3B) via zero-shot prompting. To make model behavior interpretable, we trained surrogate models to approximate LLM outputs and analyzed feature-response relationships using SHAP. Our results indicate that surrogate models closely reproduced LLM predictions (R² up to 0.915; Cohen's κ up to 0.941), enabling high-fidelity characterization of decision patterns and providing a practical framework for auditing the physiological coherence of LLM-derived predictions. Physiological coherence increased with model scale: the largest model exhibited near-complete alignment with established HRV stress responses, together with stable, predominantly monotonic feature effects and a balanced integration of physiological and contextual information. This pattern weakened at smaller scales, with the mid-scale model showing partial alignment and the smallest model displaying reduced stability, greater feature concentration, and more irregular, non-monotonic relationships.
These findings indicate that larger LLMs encode more physiologically consistent representations of stress, whereas smaller models rely on simplified and less stable strategies, and highlight the value of surrogate-based analysis as a practical framework for evaluating LLM behavior in biomedical applications and supporting their responsible integration into wearable health analytics.
Peimankar, A.; Hossein Motlagh, N.; K. Khare, S.; Spicher, N.; Dominguez, H.; Abolghasemi, V.; Fujiwara, K.; Teichmann, D.; Rahmani, R.; Puthusserypady, S.
Background: Atrial fibrillation (AFib) is the most common sustained arrhythmia in the world, imposing a heavy clinical and economic burden on global healthcare systems. Early detection of AFib can reduce mortality and morbidity, while helping to alleviate the growing economic burden of cardiovascular diseases. With the increasing availability of digital health technologies, computational solutions have great potential to support the timely diagnosis of cardiac abnormalities. Objectives: With the increasing availability of electrocardiogram (ECG) data from clinical and wearable devices, manual interpretation has become impractical due to its time-consuming and subjective nature. Existing automated approaches often rely on single classifiers or fixed ensembles that primarily optimize predictive accuracy while neglecting model diversity, which leads to limited robustness and generalization across heterogeneous datasets. Therefore, this study aims to develop a robust and diversity-aware framework for automatic AFib detection that simultaneously improves classification performance and model generalizability. To this end, we propose MOE-ECG, a multi-objective ensemble selection and fusion framework that explicitly optimizes both predictive performance and inter-model diversity for reliable AFib detection from ECG recordings. Methods: The proposed multi-objective ensemble (MOE) framework formulates ensemble selection as a bi-objective optimization problem and employs multi-objective particle swarm optimization to identify complementary classifiers from a heterogeneous model pool. Unlike conventional ensembles, it explicitly optimizes both predictive performance and diversity and integrates Dempster-Shafer theory for uncertainty-aware decision fusion. After filtering the ECG signals to remove baseline wander and noise, they were segmented into windows of 20, 60, and 120 heartbeats with 50% overlap.
The proposed approach was evaluated over five independent runs to assess its stability and generalization. Fifteen statistical and nonlinear features were obtained from the RR-intervals of the pre-processed ECG signals, of which eight features were selected with correlation analysis to capture subtle information from the ECG data. We trained and evaluated the proposed model on three open-source databases, namely, the MIT-BIH Atrial Fibrillation Database, Saitama Heart Database Atrial Fibrillation, and Long-Term AF Database. Results: The proposed approach achieved the best overall performance on 60-beat segments, with an average accuracy of 89.85%, precision of 91.14%, recall of 94.19%, an F1-score of 92.64%, and area under the curve (AUC) of around 0.95. Statistical analysis using Holm-adjusted Wilcoxon tests confirmed significant improvements (p<0.05) compared to both the best individual classifier and the unoptimized average ensemble of all classifiers. These findings show that the proposed selection and evaluation methodology, rather than group aggregation alone, is the key driver of performance improvements. Conclusion: The results obtained demonstrate that the MOE-ECG model offers a robust, accurate, and reliable solution for the detection of AFib from short ECG segments. The empirical findings, in general, confirm that multi-objective ensemble fusion enhances diagnostic performance and offers robust predictions that will open up possibilities for real-time AFib detection in clinical and tele-health settings.
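The "Holm-adjusted Wilcoxon tests" mentioned above use the Holm step-down correction for multiple comparisons; the adjustment itself is a short, self-contained procedure. A minimal generic sketch (toy p-values, not the authors' code):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values.
    Sort p-values ascending; the i-th smallest (0-based rank) is
    multiplied by (m - i), clipped at 1, and adjusted values are
    forced to be monotone non-decreasing along the sorted order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)   # enforce monotonicity
        adjusted[i] = running_max             # store in original position
    return adjusted

# Hypothetical raw p-values from three pairwise comparisons
pvals = [0.01, 0.04, 0.03]
adj = holm_adjust(pvals)
```

Compared with Bonferroni, Holm controls the family-wise error rate at the same level while rejecting at least as many hypotheses, which is why it is a common default for pairwise classifier comparisons.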
Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.
Accurate recognition and de-identification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance; errors in either step risk exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems must detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.
Spyretos, C.; Tampu, I. E.; Lindblad, J.; Haj-Hosseini, N.
The classification of pediatric brain tumors is investigated using deep learning on hematoxylin and eosin (H&E) and antigen Ki-67 (Ki-67) whole slide images (WSIs) from the Children's Brain Tumor Network (CBTN) dataset. A total of 1,662 unregistered WSIs (1,047 H&E and 615 Ki-67 images) were analyzed, including low-grade glioma/astrocytoma (grades 1, 2) (LGG), high-grade glioma/astrocytoma (grades 3, 4) (HGG), medulloblastoma (MB), ependymoma (EP) and ganglioglioma. The aim of this study was to effectively classify pediatric brain tumors using H&E and Ki-67 WSIs individually, and to investigate whether early, intermediate, and late fusion could improve the predictive performance. From each WSI, 224×224 pixel patches were extracted, and the instance (patch)-level features were obtained using the histology foundation model CONCHv1_5. The instances were aggregated using clustering-constrained attention multiple instance learning (CLAM) for patient-level classification. Model interpretability and explainability were assessed through attention heatmaps, cell density and Ki-67 labelling index (LI) maps. In the binary grade classification between LGG and HGG, the intermediate concatenation fusion achieved the best performance with a balanced accuracy of 0.88 ± 0.05 (p < 0.005) compared to the single-stain models (H&E: 0.84 ± 0.05, Ki-67: 0.86 ± 0.05). For the 5-class tumor type classification, the one-hidden-layer late fusion model achieved the highest balanced accuracy of 0.83 ± 0.04 (p < 0.005), outperforming the single-stain models (H&E: 0.77 ± 0.05, Ki-67: 0.74 ± 0.05). Overall, most of the fusion approaches outperformed the single-stain models in both classification tasks (p < 0.005).
The Ki-67 attention maps demonstrated moderate to strong Spearman correlation (ρ = 0.576-0.823) with the cell density and Ki-67 LI maps, suggesting that these features are associated with the model's predictions, although additional features may contribute. The results show that H&E and Ki-67 images provide complementary information, and most of the multi-stain fusion approaches using deep learning improve pediatric brain tumor diagnosis.
German Mesner, I.; Lake, D. E.; Kausch, S. L.; Krahn, K. N.; Gummadi, A.; Clark, T. W.; Niestroy, J. C.; Sahni, R.; Vesoulis, Z. A.; Gootenberg, D. B.; Ambalavanan, N.; Travers, C. P.; Fairchild, K. D.; Sullivan, B. A.
Premature very low birth weight (VLBW) infants have high rates of mortality and morbidity from sepsis, necrotizing enterocolitis, and respiratory failure requiring intubation and mechanical ventilation. Earlier detection of cardiorespiratory deterioration using vital signs from continuous physiological monitoring may lead to more timely interventions and improved outcomes. To further this research area, we present PreMo, a publicly available dataset of continuous heart rate and oxygen saturation, demographics, clinical events, and outcomes for 3,829 VLBW patients from four Neonatal Intensive Care Units (NICUs) in the United States. The PreMo dataset consists of a collection of parquet files, RO-Crate metadata, and sample usage code scripts hosted on the University of Virginia LibraData Dataverse website.
Liu, X.; Wen, X.; He, L.; Liu, X.; Gao, Y.; Guo, X.
Background: Adolescent major depressive disorder (AMDD) is a prevalent and heterogeneous psychiatric condition that emerges during a critical period of brain development. Neuroimaging-based biomarkers derived from resting-state functional magnetic resonance imaging (rs-fMRI) hold promise for objective diagnosis; however, pronounced inter-individual variability and limited sample sizes pose major challenges for robust model development. Methods: We propose a memory-augmented Meta-Graph Convolutional Network (BrainMetaGCN) to classify AMDD using rs-fMRI functional connectivity. Individual functional connectivity matrices were constructed by parcellating rs-fMRI time series into cortical regions of interest and computing pairwise correlations. A meta-graph generator dynamically learned subject-specific graph structures, which were processed by lightweight graph convolutional layers. A memory neural network was incorporated to encode population-level prototypical connectivity patterns and generate individualized representations via attention-based retrieval. Model performance was evaluated across multiple independent datasets and compared with state-of-the-art deep learning approaches. Additionally, network interpretability was examined through cortical hierarchy analysis and functional enrichment of discriminative network components. Results: The proposed BrainMetaGCN consistently outperformed baseline models, including convolutional and transformer-based approaches, achieving higher accuracy, area under the receiver operating characteristic curve, sensitivity, and specificity. Memory-module-derived functional networks exhibited clear modular organization and showed a significant positive correlation with cortical functional hierarchy, supporting their neurobiological validity.
Functional enrichment analyses implicated synaptic transmission, axon guidance, receptor tyrosine kinase signaling, and immune-related pathways, suggesting neurodevelopmental and neuroimmune mechanisms underlying AMDD. Ablation analyses confirmed that memory augmentation and dynamic meta-graph construction were critical for robust performance under small-sample conditions. Conclusions: This study introduces a robust and interpretable memory-augmented graph learning framework for AMDD classification. By effectively balancing individual specificity and population-level generalization, BrainMetaGCN advances neuroimaging-based precision diagnosis and provides new insights into the neural and biological mechanisms of adolescent depression.
Hornak, G.; Heinolainen, A.; Solyomvari, K.; Silen, S.; Renkonen, R.; Koskinen, M.
Selecting an effective treatment relies on accurately anticipating a patient's response to alternative interventions. However, forecasting longitudinal clinical trajectories remains difficult because electronic health records contain heterogeneous, irregularly sampled data over extended time periods. These issues are especially relevant for laboratory measurements, which are central for diagnostics, assessment of therapeutic responses, and tracking disease progression in routine clinical practice. Moreover, existing deep learning methods for counterfactual prediction usually assume regularly sampled data, an assumption incompatible with the irregular, heterogeneous data-generation processes of real-world clinical practice. Here we present the Time-Aware G-Transformer, which integrates causal G-computation with time-aware attention to predict counterfactual outcomes on irregular data. By explicitly conditioning on the timing of future observations and encoding measurement patterns, the model captures temporal dynamics that previous methods overlook. Evaluated on synthetic tumor growth data and on 90,753 cancer patient trajectories from an academic medical center, our approach demonstrates superior long-horizon (> 1 day) prediction accuracy and uncertainty calibration compared to state-of-the-art baselines. These results demonstrate that embedding temporal relations directly into the attention mechanism enables robust integration of patient history data for evaluating potential treatment strategies in personalized medicine.
Lore, S.; Julihn, C.; Telfer, P.; Scheibye-Knudsen, M.; Verdin, E.
Biological brain aging is a major determinant of cognitive decline and neurodegenerative disease, yet scalable and intervention-ready brain aging biomarkers remain limited. Here, we develop an electroencephalography (EEG)-based brain age clock using machine learning trained on high-dimensional neural features across the adult lifespan. Using 643 features captured from a Sens.ai headset and controller device, the model predicts chronological age with high accuracy (Pearson r = 0.92; MAE = 4.43 years) and yields an interpretable set of age-informative neural features capturing functional signatures of brain aging. Unlike MRI-based approaches, this EEG-based clock is non-invasive, transportable, cost-effective and suitable for repeated at-home longitudinal measurement. Furthermore, in a longitudinal neuromodulation program, BrainYears-predicted brain age decreased by a mean of 5.18 years in the intervention group whereas a minimal-exposure comparison group showed no change on average (+0.07 years). Together, this work introduces a functional brain aging biomarker and an intervention-ready platform for quantifying brain age modulation.
Purkayastha, D. S.
Inadequate discharge communication is a well-documented contributor to medication non-adherence, missed follow-ups, and preventable readmissions across healthcare systems worldwide. In resource-limited oncology settings, where patients are often low-literate, speak non-dominant languages, and manage complex multi-drug regimens, this problem is acute and largely unaddressed. We present Aakhyan, a vernacular patient communication platform that addresses the full post-discharge arc: from converting English-language discharge summaries into structured, voice-based vernacular explanations, through medication adherence support, to proactive follow-up management, all delivered via WhatsApp. The architecture is novel in its strict separation of concerns: a vision-language model performs structured JSON extraction from discharge images, while all patient-facing content is generated deterministically from clinician-approved templates with community-sensitive vocabulary registers. This design eliminates the hallucination risk inherent in generative AI patient communication (documented at 18-82% in prior studies) while preserving the extraction capability of large language models. The platform supports four language registers: Bengali, Hindi, simplified English for tribal populations, and Assamese, with text-to-speech synthesis across all registers, including a custom grapheme-to-phoneme engine developed for Assamese phonology. Beyond discharge communication, the platform includes scheduled medication adherence nudges, interactive follow-up reminders, and a Daily Availability and Patient Notification System (DAPNS) that notifies patients the evening before their follow-up whether their doctor and required investigations are available, preventing wasted trips by rural patients who travel 2-6 hours to reach the centre.
A 100-patient stratified randomised controlled study is planned at Silchar Cancer Centre, with structured teach-back assessment at 48-72 hours post-discharge as the primary comprehension outcome and preliminary clinical efficacy as a secondary objective. This paper describes the clinical rationale, technical architecture, safety framework, and positioning of Aakhyan within the existing literature on mHealth patient communication interventions.
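The separation of concerns described above (extraction by a model, generation by templates) can be sketched briefly. Everything here is illustrative: the template text, field names, and `render_discharge` function are invented stand-ins for Aakhyan's clinician-approved templates, and a real system would validate the extracted JSON against a schema before rendering.

```python
# Hypothetical clinician-approved templates, keyed by language register.
# The model never writes patient-facing text; it only fills named slots.
TEMPLATES = {
    "en_simple": "Take {drug}, {dose}, {times} every day. Next visit: {followup}.",
}

def render_discharge(record, register="en_simple"):
    """Deterministically fill a template from a structured extraction.
    A missing field raises a KeyError instead of being invented, which is
    the property that removes generative hallucination from this stage."""
    return TEMPLATES[register].format(**record)

# Example record as a vision-language extractor might emit it (illustrative).
record = {"drug": "Tamoxifen", "dose": "20 mg", "times": "once",
          "followup": "12 March"}
print(render_discharge(record))
```

The design choice is that hallucination risk is confined to the extraction stage, where output is structured and checkable, rather than the free-text stage the patient actually reads.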
Amewudah, P.; Popescu, M.; Farmer, M. S.; Powell, K. R.
Background: Secure text messages (TMs) exchanged among interdisciplinary care teams in nursing homes (NHs) contain clinical information that aligns with the Age-Friendly Health Systems 4Ms: What Matters, Medication, Mentation, and Mobility. Yet this information is not captured in any structured form, making it unavailable for systematic monitoring or quality reporting. Automatically extracting 4M information accurately and efficiently from these messages could enable several downstream applications within long-term care settings. This task, however, is challenging because of the fragmented syntax, brevity, abbreviations, and informality of TMs. Objective: This study aimed to develop and evaluate a multi-stage 4M Entity Recognition (4M-ER) pipeline that combines a fine-tuned token classifier with large language model (LLM) revision, using only locally deployed open-source models, to improve 4M information extraction from clinical TMs. Methods: We used an expert-annotated dataset of 1,169 TMs collected from interdisciplinary teams across 16 Midwest NHs. The pipeline first identifies candidate text spans using a fine-tuned Bio-ClinicalBERT token classifier. A semantic similarity retriever then selects in-context exemplars to guide an LLM revision in which the LLM (Gemma, Phi, Qwen, or Mistral) performs boundary correction, label evaluation, and selective acceptance or rejection of candidate spans. Baselines for comparison included single-stage zero-shot LLMs, single-stage fine-tuned Bio-ClinicalBERT, and a fine-tuned LLM (Gemma) from a prior study. Ablation studies assessed the contribution of each pipeline stage and the effect of message filtering. Robustness was evaluated across 5 repeated runs. Results: The 4M-ER pipeline outperformed the previously fine-tuned Gemma LLM across all 4M domains, achieving F1 (entity type) improvements of +2 to +11 percentage points without any additional fine-tuning and at roughly half the GPU memory (12 vs 24 GB).
It also improved upon single-stage fine-tuned Bio-ClinicalBERT in Mobility, Mentation, and What Matters (+0.02 to +0.05 F1). Error analysis showed that LLM revision reduced false positives by 25% to 35% by correcting misclassifications caused by conversational ambiguity, while the fine-tuned Bio-ClinicalBERT's high recall captured subtle entities that the fine-tuned Gemma missed. Silver data augmentation further improved the hardest domains, raising What Matters F1 from 0.59 to 0.67 and Mobility from 0.64 to 0.67. Ablation studies confirmed that restricting LLMs to revision only yielded optimal accuracy and efficiency. Conclusions: The 4M-ER pipeline enables accurate and scalable extraction of 4M entities from clinical TMs by combining fine-tuned Bio-ClinicalBERT with LLM revision using only locally deployed open-source models. The structured 4M data produced by the pipeline can support 4M taxonomy and ontology construction, as demonstrated in prior work, and provides a foundation for downstream applications including real-time clinical surveillance, compliance with emerging age-friendly quality measures, and predictive modeling in long-term care settings.
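The two-stage shape of the 4M-ER pipeline (a high-recall token classifier proposes spans, a revision stage accepts or rejects them) can be sketched with stand-in functions. Nothing here is the authors' code: `token_classifier` mimics Bio-ClinicalBERT's candidate generation, `llm_revise` reduces the LLM revision step to a confidence filter, and the example message and spans are invented.

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    label: str    # one of the 4M domains
    score: float  # candidate-stage confidence

def token_classifier(msg):
    """Stand-in for the fine-tuned token classifier: emits high-recall
    candidate spans, some of which are false positives."""
    return [Span("walked to dining room", "Mobility", 0.91),
            Span("room", "Mobility", 0.40)]  # spurious fragment

def llm_revise(msg, spans, threshold=0.5):
    """Stand-in for the revision stage. In the real pipeline an LLM corrects
    boundaries and accepts/rejects each candidate; here rejection of
    low-confidence spans imitates its false-positive filtering."""
    return [s for s in spans if s.score >= threshold]

msg = "Res walked to dining room w/o assist"
final = llm_revise(msg, token_classifier(msg))
print([s.text for s in final])  # ['walked to dining room']
```

The division of labor mirrors the reported error analysis: the first stage supplies recall, the second stage removes the 25-35% of false positives that conversational ambiguity produces.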
Furuichi, S.; Kohno, T.
The brain is believed to process information efficiently in a manner different from deep learning-based artificial intelligence (AI). Brain-like next-generation AI is gaining attention owing to its potential to perform human-like, highly adaptive, robust, and power-efficient computation. To realize such AI, one crucial approach is the bottom-up implementation of neuronal systems, capturing their electrophysiological characteristics in electronic circuits. However, this neuromorphic approach generally relies on simplified neuronal models that omit many biological findings. Developing closer-to-brain models is a natural direction and can serve as a fundamental computing model for next-generation AI. One constraint of neuromorphic circuits is the bit resolution of synaptic efficacy memory, as the memory footprint scales with its precision. Although low-resolution synaptic efficacy is essential for minimizing memory circuit footprint and energy consumption, it generally degrades performance in tasks such as spatio-temporal spike pattern detection. This study proposes a closer-to-brain learning rule that incorporates heterosynaptic plasticity (HP) induced by glutamate spillover. We demonstrate that our model mitigates the performance degradation associated with low-bit synaptic efficacy, achieving a pattern-detection success rate with 3-bit synaptic efficacy comparable to that obtained with 64-bit floating-point precision. Furthermore, the findings indicate that the HP-based model accelerates convergence of the synaptic efficacy and effectively potentiates synapses relevant to pattern detection while suppressing irrelevant ones, thereby promoting a bimodal distribution of synaptic efficacies. These findings may provide a basic framework for constructing energy-efficient, brain-like next-generation AI that maintains high performance under hardware constraints.
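The interplay the abstract describes, low-bit synaptic storage plus a heterosynaptic term that depresses inactive neighbours, can be caricatured in a few lines. This is not the paper's learning rule: the update equations, the `spill` constant (chosen here so each depression step moves exactly one quantization level), and the all-or-nothing activity vector are illustrative assumptions standing in for spillover-induced heterosynaptic plasticity in a spiking model.

```python
import numpy as np

def quantize(w, bits=3, w_max=1.0):
    """Round weights onto the 2**bits-level grid in [0, w_max],
    mimicking low-resolution synaptic efficacy memory."""
    levels = 2 ** bits - 1
    return np.round(np.clip(w, 0, w_max) * levels / w_max) * w_max / levels

def hetero_update(w, pre_active, lr=0.2, spill=0.1):
    """Hebbian potentiation of active synapses plus heterosynaptic
    depression of inactive neighbours (a crude spillover stand-in)."""
    w = w + lr * pre_active          # potentiate pattern-carrying inputs
    w = w - spill * (1 - pre_active) # depress the rest
    return quantize(w)               # store at 3-bit resolution

rng = np.random.default_rng(2)
w = quantize(rng.uniform(0, 1, size=10))
active = (np.arange(10) < 3).astype(float)  # first 3 synapses carry the pattern
for _ in range(20):
    w = hetero_update(w, active)
print(w)  # relevant synapses saturate high, the rest decay toward zero
```

After training, the weights sit at the two extremes of the 3-bit grid, the bimodal distribution the abstract reports; the heterosynaptic term is what prevents irrelevant synapses from drifting at coarse resolution.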